Computational conjugate adaptive optics microscopy for longitudinal through-skull imaging of cortical myelin

Myelination processes are closely related to higher brain functions such as learning and memory. While their longitudinal observation has been crucial to understanding myelin-related physiology and various brain disorders, skull opening or thinning has been required to secure clear optical access. Here we present a high-speed reflection matrix microscope using a light source with a wavelength of 1.3 μm to reduce tissue scattering and aberration. Furthermore, we develop a computational conjugate adaptive optics algorithm designed for the recorded reflection matrix to optimally compensate for the skull aberrations. These developments allow us to realize label-free longitudinal imaging of cortical myelin through an intact mouse skull. The myelination processes of the same mice were observed from 3 to 10 postnatal weeks to the depth of cortical layer 4 with a spatial resolution of 0.79 μm. Our system will expedite the investigations on the role of myelination in learning, memory, and brain disorders.


Decorrelation time due to the animal motion in the in vivo imaging
In our animal preparation procedure, a coverslip of 5 mm diameter was attached to the center of the parietal bone using an ultraviolet-curable glue. A custom-made metal plate was attached to the skull with cyanoacrylate for head fixation during the in vivo imaging, and the exposed part of the skull was covered with dental cement. Therefore, our system can be considered as a 'significantly immobilized' situation in the reference paper 1 .
We measured the decorrelation time in the in vivo imaging. The acquisition time for a single-depth reflection matrix is as long as 10.4 s in our experiment. Therefore, we measured the image decorrelation for 30 s. Specifically, we illuminated the focused beam at a depth of 70 μm beneath the dura and acquired a series of complex-field images over the field of detection of 75 × 75 μm at a frame rate of 2,000 Hz.
We calculated the intensity correlation defined by the formula, . (S1) Here and corresponds to the intensity of first image ( ) and that of the image after time ( + ), respectively, and ̅ and are their respective mean values. The and refer to the m th and n th pixels, respectively, in each image. This formula is the same as the function ( ) introduced in the reference paper 1 . Supplementary Fig. 2 shows that a high correlation is maintained during the time of 30 s with a minimum correlation value of 0.78. The regular peaks in the plot were due to the breathing of the mouse. The correlation value remained higher than 0.85 for the time duration of 10.4 s, which is the single-depth recording time for 160 µm 2 ×160 μm 2 . This result supports that the tissue make little effect on our in vivo imaging.
Supplementary Figure 2. Image correlation with time. The correlation r was calculated from a set of timegated images acquired for 30 s while the focused illumination was parked at a depth of 70 μm beneath the dura. The correlation value stayed higher than 0.78 over the entire measured time span.

The choice of the subregion size
The major advantage of the conjugate AO over pupil AO in the through-skull imaging is its larger isoplanatic patch size. For the same experimental data, we found that the isoplanatic size of the pupil-CLASS was around 15 × 15 μm while that of our conjugate-CLASS was around 80 × 80 μm ( Supplementary Fig. 8). Therefore, the size of each subregion to be analyzed can be as large as 80 × 80 μm . Since the recorded ROI was 160 × 160 μm , we can divide the entire ROI into 4 × 4 subregions with a 50 % overlapping ratio for optimal visibility. In this case, the computation time for reconstructing the entire ROI was 2,386 s. Note that the computation time is spent on the postprocessing and, thus, does not affect to the in-vivo imaging. In the actual data analysis, we chose the subregion size of 64 × 64 μm and divided the entire ROI into 5 × 5 subregions, which resulted in a reduced computation time of 1557 s. In fact, as the patch size decreases and, thus, the number of patches increases, the total computation time gets shorter because the computation time for each patch scales with a power of 2 with respect to the area of the patch as shown in Supplementary Fig. 3. Since we processed the data over 140 to 170 depths in the volumetric image, this difference in computation time is a critical factor. We didn't reduce the subregion size to smaller than 64 × 64 μm to avoid image fragmentation in merging multiple images.

Comparison between input and output aberration maps
The and should be the same due to the double-pass geometry and reciprocity. However, they can slightly be different due to experimental imperfections such as the slight misalignment between the illumination and detection pathways. Oftentimes, this misalignment is necessary to avoid direct back-reflection from the optics. In Fig. 2d, we showed only the output aberration maps as the input aberration maps are similar. Here we showed both the input and output aberration maps for a few different depths. The patterns have high similarity with their correlation value of 0.75 or higher.
Supplementary Figure 4. Typical input and output aberration maps. a, Aberration maps at a depth of 60 μm beneath the dura. The correlation between the input and output aberration maps was 0.75. b, Same as a, but at a depth of 150 μm beneath the dura. Correlation value: 0.79. c, Same as a, but at a depth 450 μm beneath the dura. Correlation value: 0.80. Scale bars, 80 μm. Color bar, phase in radians.

Size of the aberration map in the conjugate-CLASS
The size of the scale bar in Fig. 2d varies with depth. This is because the size of the aberration map identified by the conjugate-CLASS increases with depth due to the illumination/detection geometry. As shown in Supplementary Fig. 5, the area in the skull plane responsible for the focused illumination is enlarged with the increase of the distance between the objective focus and the conjugate plane. The numerical aperture of the objective lens is another factor determining the aberration map size. In our experimental configuration, the diameter of the aberration map was given approximately by D~1.2z.
Supplementary Figure 5. The size of aberration maps at the skull plane. a, Schematic of two focused beams illuminating on the two ends of ROI at the sample plane through the skull. L : the size of ROI in one axis, which is typically 160 μm in the experiment. Θ : the angle of the cone of the illumination beam. This can be calculated by NA = n × sinθ , where NA is the numerical aperture and n is the average refractive index of the medium. z: the distance from the sample plane to the skull plane. R: radius of the focused illumination in the skull plane, which can be calculated by R = z × tan θ . As a result, the diameter of aberration map D can be obtained by D = L + 2R. b, The diameter of aberration map D depending on the imaging depth z. Relation between the diameter and the imaging depth was given by D~1.2z.

Theoretical framework of the conjugate-CLASS 2.1. Basis conversion
The experimentally measured electric field ( ; ) constituting needs to be converted to ( , ), the matrix element of for the application of the conjugate-CLASS algorithm. This is realized by the free-space propagation of detected/illumination fields from / to / . Let us suppose that the axial coordinate is = 0 for / , and that of / is > 0. An incident wave focused at is a diverging spherical wave at , which is expressed as ( − , ). Here ( , ) is the electric field of a point source emanating from the origin2: Therefore, for the focused illumination at , the electric field at is written as the superposition of ( ; ): The next step is to propagate the electric field measured at to located at -z in the reflection geometry. Since a point source emanating from is a diverging spherical wave ( − , ) at , we can find the electric field at by the convolution of ( ; ) with ( − , ): We obtain ( ; ) by the successive operations of Eq. (S2) and Eq. (S3) from ( ; ). Figures 1b and 1c show the schematic configurations of the space-domain reflection matrix and a conjugate-plane reflection matrix , respectively. The electric field ( ; ) constituting was described in Eq. (1) in the main text. Here, we present its detailed derivation. A focused illumination at experiences phase retardation ( ) by the skull and propagates to the sample plane at = ( , ) at a distance from the skull. Under Fresnel approximation, the electric field impinging on the sample plane is written as

Derivation of Eq. (1)
The electric field reflected by the object at the sample plane is given by ( ; ) = ( ) ( ; ), where ( ) is the amplitude reflectance of the object at the sample plane. This reflected wave propagates back to the conjugate plane and experiences phase retardation ( ) by the same skull. Therefore, the electric field of the backscattered wave at is expressed as By inserting Eq. (S2) into Eq. (S3), we obtain ( ; ) as Here, where ( ) is the amplitude reflectance of the target object at the sample plane. Since there exist multiple scattering backgrounds from the skull and brain tissues, the multiple-scattered wave ( ; ) should be added to Eq. (S7), which is the Eq. (1) in the main text.

Conjugate-CLASS algorithm
We developed a conjugate-CLASS algorithm that finds ( ), ( ), and from ( ; ) in Eq. (1). Here, we exploit the fact that the object spectrum . Since this resembles the shift-invariance of ( − ) with respect to in the pupil-CLASS, we used a similar iterative optimization engine as that of the pupil-CLASS. The algorithm consists of two steps, first finding ( ) and then ( ).

Finding the approximate ( )
For each , we compute the angle of the inner product 〈 ( ; ) * ( ; = 0)〉 to obtain the first estimate of ( ): Here, we assume ( = 0) = 0 since only the relative phase matters. The error term ( ) ( ) arises due to the presence of ( − ) and the multiple scattering term. We apply the correction of this first estimation by multiplying Once this correction is in place, we move to the next step to correct ( ) ) . We can ignore the terms Δ ( ) and Δ ( ) as they became smaller than a small tolerance level. We then replace with = + in ( ; ): ( ; ) ≅ − e ⁄ ( ⁄ ) + ( ; ). (S15) The summation of ( ; ) with respect to for orthogonal pixels leads to the object spectrum.
( ; ) ≅ − e ⁄ ( ⁄ ). (S16) The summation of ( ; ) is incoherent such that its magnitude grows with √ . Therefore, its contribution becomes smaller than the term with the object spectrum for a sufficiently large . Taking the inverse Fourier transform of Eq. (S16) results in the object function ( ).

Convergence condition
The convergence of the algorithm can be determined from the correlation of output waves between two incident wave vectors ( ) and ( ) , which can be expressed as where ( ) = . We ignored the cross terms since they are much smaller than the second term on the right-hand side. The factor ( ) ( ) in the first-term on the right-hand side is the input aberration that we are going to identify in this correlation, and the second term serve as a noise. The magnitude of first term are largely given by single-scattering intensity ≈ ( ) ( /z) and the normalized cross-correlation of output aberrations ≈ 〈 ( ) ( ) 〉 ( ) , where ( ) is the number of channel used to average over . Since the magnitude of is related to the degree of aberration, it gets smaller as the aberration becomes complex. The magnitude of the second term is given approximately by ( ). Hence, the fidelity of correction is determined by the ratio between the first and the second terms, = ( ⁄ ) ( ). The algorithm's convergence depends on the certain threshold value of that is given by single to multiple scattering intensity ratio ⁄ , the degree of aberration , and the number of channels ( ).

Comparison between conjugate-CLASS and pupil-CLASS 4.1 Isoplanatic patch size
Supplementary Figure 8. Comparison of isoplanatic patch size between pupil-CLASS and conjugate-CLASS. a, Images processed by the pupil-CLASS algorithm for the different ROI sizes with respect to the same center. The size was chosen from 20 × 20 μm to 80 × 80 μm . The signal enhancement was highest at the size of 30 × 30 μm , and the maximum intensity here was set to be 1. At ROIs larger than 50 × 50 μm , signal enhancement was reduced to 10-20 %. Furthermore, the intensity of the myelin segment was not even, and background noise was increased. At ROIs smaller than 40 × 40 μm , the signal enhancement was similar. However, the myelin segment was reconstructed only near the center of the ROI. From the length of the reconstructed myelin segment, the isoplanatic patch size was estimated to be 15 × 15 μm . The data used here was acquired at cortical layer 1 of the intact skull mouse brain. Scale bar, 20 μm. Color bars: normalized intensity with respect to the maximum value in 30 × 30 μm case. b, Aberration-corrected image by conjugate-CLASS algorithm at ROI size of 80 × 80 μm was cropped into the ROI sizes corresponding to those at a. Unlike pupil-CLASS, the conjugate-CLASS algorithm was able to reconstruct the myelin segment with uniform intensity and high contrast across the ROI of 80 × 80 μm . Color bar: normalized intensity.

Image quality
Supplementary Figure 9. Comparison of image quality between pupil-CLASS and conjugate-CLASS. a, Aberration-corrected images by the pupil-CLASS algorithm. Individual images are MIPs in the depth ranges of 41-56, 443-470, 515-539, and 548-578 μm. Each image was normalized by its maximum intensity. b, Aberration-corrected images by the conjugate-CLASS algorithm at the same depths as those shown in a. These images are the same as those in Fig. 3c in the main text. Scale bar, 40 μm. Pupil-CLASS was processed with the patch size of 20 × 20 μm , and the reconstructed images were overlapped with one another in the ratio of 50 %. Pupil-CLASS has degeneracy in the tilt and ambiguity in the defocus. (Supplementary section 4.4) This can give rise to the lateral and focal shifts of the reconstructed images. For example, the optimization can converge to the neighboring depth where there is a stronger signal than the one at the objective focus. Therefore, the same myelin segment can be reconstructed across multiple depths with different lateral shifts. This causes the blur of myelin segments in MIP images, as indicated by white arrowheads in a. Furthermore, the reconstruction fidelity is low when there is no myelin segment within the small patch. This is responsible for large non-uniform background noise.

Comparison of aberration maps from conjugate-CLASS and pupil-CLASS
Let us clarify the difference in the physical meaning of the aberration map between pupil-CLASS and conjugate-CLASS (Supplementary Fig. 10a). The aberration map obtained by the pupil-CLASS is the angle-dependent phase retardation. On the contrary, that obtained by the conjugate-CLASS is the local phase variations induced by the skull at the conjugate plane. Therefore, the fine phase patterns in the aberration map are associated with the fine spatial structures of the skull. The slowly varying phase pattern can be due to the slowly varying curvature of the skull. It is not legitimate to conduct the Zernike decomposition and find the defocus component directly from the conjugate-CLASS aberration map.
Supplementary Figure 10. Conversion of the aberration map at the skull plane to that in the pupil plane. a, Schematic of the wave propagation to the mouse brain through the skull. The conjugate-CLASS algorithm corrects the aberration for the dark blue region at the skull plane marked in the schematic. Therefore, the aberration map corresponds to the effective spatial phase retardation induced by the skull. On the contrary, pupil aberration marked with dark blue at the pupil plane in the schematic accounts for the angle-dependent phase retardation. b, conjugate-CLASS aberration maps acquired for three representative data. c, Pupil aberration maps obtained by converting the conjugate aberration maps in b. d, Pupil aberration maps after removing the defocus component, which is the fourth index of Zernike polynomial , in c. Color bar, phase in radians. e, Zernike coefficients of the pupil aberration in c.
component accounts for 0.55 %, 1.9 %, and 0.09 % of the total wavefront distortion from the top. Note that the Zernike polynomial used here follows the OSA/ANSI standard index.
Supplementary Figure 11. Strehl ratio depending on the depth before and after component subtraction. a, Strehl ratio before (red dots) and after (blue dots) component subtraction. The data is from Supplementary Movie 3. b, Same as a, but for the data in the main text Fig. 3. For data #2, we only took data in the cortical layers 1 and 4 to focus on the myelin analysis on those layers. c, Expected enhancement of the PSF intensity by the aberration correction for the mouse #1 data before (red dots) and after (blue dots). Enhancement was estimated by the inverse of the Strehl ratio. d, Same as c, but for data #2.
To find the defocus component, it is necessary to convert the aberration map obtained by the conjugate-CLASS into that in the pupil plane. This is done by the following procedures. The aberration map in the conjugate plane is applied to a focused illumination at the conjugate plane. This modified wave is propagated to the object plane, and its inverse Fourier transform is taken to obtain the aberration map in the pupil plane. Supplementary Fig. 10c shows thus obtained pupil aberration map from the conjugate aberration map in Supplementary Fig. 10b. We then perform the Zernike decomposition of the pupil aberration map (Supplementary Figs. 10d and e) and found that the defocus aberration accounts for less than 2 % of the total wavefront distortion. As such, the defocus correction made little effect on the Strehl ratio and, thus, the expected enhancement estimated by the inverse of the Strehl ratio ( Supplementary Fig. 11). Intuitively, the slowly varying lower-order aberration in the conjugate plane makes a negligible effect on the defocus to the spherical wave with a large spatial phase curvature converging to the object plane.

The effect of degeneracy in the tilt and ambiguity in the defocus
The original pupil-CLASS algorithm finds the aberration map that maximizes the total single scattering intensity of the reconstructed image. In this process, there can be multiple solutions that can provide the same total intensity. Suppose the aberration map has an additional phase-tilt component with respect to the ground truth map. Then, the object spectrum can have the opposite tilt component to counterbalance the tilt in the aberration map in such a way that the resulting reflection matrix is the same as the original reflection matrix. In other words, an aberration map with an overall phase tilt causing a lateral image shift could be another solution to the algorithm.
The tilt degeneracy can cause difficulty in merging reconstructed image patches. Due to the small patch size in the pupil-CLASS, many small image patches as large as 1616 should be merged to form an image over the full ROI. However, the unpredictable lateral image shift in each patch makes the merged image fragmented such that the reconstructed myelin fiber can be disconnected after merging. One can manually find and correct the tilt component in each aberration map. This is a substantial work by itself, and its precision cannot be guaranteed when the aberration map is too complex. The tilt degeneracy can also occur in the conjugate-CLASS algorithm, but this is not a big issue since the patch size is large enough to faithfully connect multiple patches.
In the case of defocus, it is an ambiguity rather than a degeneracy. When the myelin segments are located off from the objective focus, the algorithm tends to add a defocus in the aberration map to maximize the reconstructed image intensity. In this case, the focus of each patch can be shifted to the plane where the myelin is situated. This can also cause the fragmentation of the merged image in the pupil-CLASS especially when the myelin segment is slanted along the axial direction. The ambiguity in the defocus can also occur in the conjugate-CLASS, but the large isoplanatic patch size ensures the fidelity of merging subregion images.

Comparison between reflectance imaging and fluoromyelin-stained fluorescence imaging
Supplementary Figure 12. Comparison between the label-free reflectance image and the fluoromyelinstained fluorescence image. a, An ex-vivo mouse brain stained with the fluoromyelin dye (1:200 in phosphatebuffered saline, Invitrogen). b, Confocal reflectance image taken at cortical layer 1. Scale bar, 30 μm. c, Twophoton fluorescence image taken at the same region as b. The two-photon excitation wavelength was 1.05 m, and the center wavelength of the emission filter was 0.63 m. d, Merged image of b and c. Yellow arrows indicate the representative myelin fibers appearing in both images. This result validates that the fibrous structures in the label-free reflectance image are the myelin segments.